Abstract: A causal rate distortion function with a general
fidelity criterion is formulated on abstract alphabets and a coding 
theorem is derived. Existence of the minimizing kernel is shown 
using the topology of weak convergence of probability measures. 
The optimal reconstruction kernel is derived, which is causal, 
and certain properties of the causal rate distortion function are 
presented. 

I. INTRODUCTION 

Given a distortion or fidelity constraint between source sequences $X^\infty = \{X_i\}_{i=0}^{\infty} \in \mathcal{X}^\infty = \times_{i=0}^{\infty} \mathcal{X}_i$ and reproduction sequences $Y^\infty = \{Y_i\}_{i=0}^{\infty} \in \mathcal{Y}^\infty = \times_{i=0}^{\infty} \mathcal{Y}_i$, non-causal codes achieve the rate distortion function (RDF) of the source, which is the optimal performance theoretically attainable. The RDF is described in [1] for memoryless sources, in [2] for stationary ergodic sources, in [3] for information and distortion stable processes, and in [4] using the information spectrum method. The RDF for general sources on Polish spaces (complete separable metric spaces) and its properties are discussed extensively in [5].

Causal codes as defined in [6] are a sub-class of non-causal codes, with the additional constraint on the reproduction coder (cascade of encoder-decoder) that $Y_i$ depends on the past and present source symbols $\{X_0, X_1, \ldots, X_i\}$ but not on the future symbols $\{X_{i+1}, X_{i+2}, \ldots\}$; thus, $Y_i = f_i(X_0, X_1, \ldots, X_i)$, where $\{f_i\}_{i=0}^{\infty}$ are measurable functions called reproduction coders.

Causal codes are extensively analyzed in [6] using entropy type criteria (entropy of reproduction coder) and further investigated in [7] where side information is present, while [8] considers stationary sources at high resolution. The rate loss due to causality for Gaussian stationary sources with memory and mean square distortion is analyzed in [9], and recently in [10].

Zero-delay codes are a sub-class of causal codes, with the additional constraint on the reproduction coder that the reproduction $Y_i$ is produced at the same time the corresponding source symbol $X_i$ is encoded, that is, both encoding and decoding are done causally. Sequential codes as defined in [11] and applied in [12] are causal zero-delay codes such that the reproduction of each source symbol is done sequentially following the time ordering $X_0, Y_0, X_1, Y_1, \ldots$.

The objective of this paper is to impose a causal constraint on the reproduction coder, and formulate the causal source coding problem with fidelity criterion via rate distortion theory, on general alphabets using the topology of weak convergence of probability measures. The results include the following.

1) Information theoretic definition of causal rate distortion 
function as an optimization problem in which the re- 
production conditional distribution satisfies a causality 
constraint. 

2) Source coding theorem for directed information and dis- 
tortion stable processes. 

3) Expression of the optimal causal reconstruction distribu- 
tion and properties of the causal RDF. 

Causal Rate Distortion Function (CRDF). 

The precise definition of causal codes, found in [6], is stated below.

Definition 1.1: (Causal Reproduction Coder) A reproduction coder $\{f_i\}_{i=0}^{\infty}$ is called causal if for all $i \leq n$

$$f_i(x^n) = f_i(\tilde{x}^n) \quad \text{whenever} \quad x^i = \tilde{x}^i.$$

A source code is called causal if its induced reproduction coder is causal.
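For finite alphabets and a finite horizon, the condition in Definition 1.1 can be checked by brute force. The following is a minimal sketch under those assumptions; the function `is_causal` and the two example coders are hypothetical names introduced only for illustration.

```python
from itertools import product

def is_causal(coder, alphabet, n):
    """Brute-force check of the causality condition on a finite alphabet.

    coder(i, x) returns the reproduction symbol y_i for the full source
    block x = (x_0, ..., x_n). The coder is causal if y_i is unchanged
    whenever two blocks agree on their first i+1 symbols.
    """
    blocks = list(product(alphabet, repeat=n + 1))
    for i in range(n + 1):
        seen = {}  # maps the prefix x^i to the reproduction symbol y_i
        for x in blocks:
            y_i = coder(i, x)
            if seen.setdefault(x[: i + 1], y_i) != y_i:
                return False
    return True

# Hypothetical coders on a binary alphabet.
def running_parity(i, x):
    # Uses only x_0, ..., x_i.
    return sum(x[: i + 1]) % 2

def lookahead(i, x):
    # Peeks at x_{i+1} when it is available.
    return x[min(i + 1, len(x) - 1)]

print(is_causal(running_parity, (0, 1), 3))
print(is_causal(lookahead, (0, 1), 3))
```

The first coder depends only on the past and present symbols, so the check passes; the second reads a future symbol, so two blocks with the same prefix can yield different reproductions and the check fails.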

From Definition 1.1 it follows that the reproduction coder is causal if and only if the following Markov chain holds:

$$(X_{i+1}, X_{i+2}, \ldots) \leftrightarrow (X^i, Y^{i-1}) \leftrightarrow Y_i, \quad i = 0, 1, \ldots.$$

Assume an average distortion constraint

$$\frac{1}{n+1} E\Big\{ \sum_{i=0}^{n} \rho_i(X^i, Y^i) \Big\} \leq D,$$

where $D \geq 0$ and $\rho_i(\cdot, \cdot)$ is a non-negative distortion function.
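Consider causal reproduction coders defined in Definition 1.1. Define the causal convolution of conditional distributions by

$$\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \triangleq \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i).$$

Since a reproduction coder is causal if and only if the above Markov chain holds, the reproduction conditional distribution of a causal coder satisfies

$$P_{Y^n|X^n}(dy^n|x^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \quad \text{a.s.} \qquad (1)$$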



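Substituting (1) into the mutual information $I(X^n; Y^n)$, it follows that for causal coders the information theoretic RDF, for which an operational meaning will be sought, is given by

$$R^c(D) = \lim_{n \to \infty} \inf \frac{1}{n+1} I(X^n; Y^n) = \lim_{n \to \infty} \inf \frac{1}{n+1} I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) \qquad (2)$$

where the infimum is over the causal reproduction distributions $\overrightarrow{P}_{Y^n|X^n}$ that satisfy the average distortion constraint,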



where the joint distribution $P_{X^n,Y^n}(dx^n, dy^n)$ for causal codes is uniquely defined by $P_{X^n,Y^n}(dx^n, dy^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n)$. Note that (2) is precisely the expression considered in [11] to derive the coding theorem for sequential codes. It is easy to verify that $I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is the directed information from $X^n$ to $Y^n$, $I(X^n \to Y^n) = \sum_{i=0}^{n} I(X^i; Y_i | Y^{i-1})$, subject to the requirement that the source is not affected by past reconstruction symbols, that is, $\otimes_{i=0}^{n} P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1}, y^{i-1}) = \otimes_{i=0}^{n} P_{X_i|X^{i-1}}(dx_i|x^{i-1}) = P_{X^n}(dx^n)$. However, if the causality constraint (1) is not imposed, the conditional distribution $\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)$ in (2) should be replaced by $P_{Y^n|X^n}(dy^n|x^n)$, and the resulting expression is the classical RDF. Since, by the chain rule,

$$P_{Y^n|X^n}(dy^n|X^n = x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^n}(dy_i|Y^{i-1} = y^{i-1}, X^n = x^n),$$

in general the classical RDF solution yields reconstructions $Y_i = y_i$ which depend on future values of the source symbols $(X_{i+1} = x_{i+1}, \ldots, X_n = x_n)$, in addition to the past reconstruction symbols $Y^{i-1} = y^{i-1}$ and the past and present source symbols $X^i = x^i$. On the other hand, (1) implies causality.
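The relation between mutual information and directed information used above can be made explicit with the chain rule of conditional mutual information. The display below is a short worked step, stated here only as a reading aid and using the convention that $X_{i+1}^{n} = (X_{i+1}, \ldots, X_n)$, which is notation assumed for this illustration.

```latex
\begin{align*}
I(X^n;Y^n) &= \sum_{i=0}^{n} I(X^n;Y_i \mid Y^{i-1}) \\
&= \sum_{i=0}^{n} I(X^i;Y_i \mid Y^{i-1})
 + \sum_{i=0}^{n-1} I(X_{i+1}^{n};Y_i \mid Y^{i-1},X^i) \\
&= I(X^n \rightarrow Y^n)
 + \sum_{i=0}^{n-1} I(X_{i+1}^{n};Y_i \mid Y^{i-1},X^i).
\end{align*}
```

Each term in the last sum vanishes exactly when $Y_i$ is conditionally independent of the future source symbols given $(X^i, Y^{i-1})$, which is the Markov chain characterizing causal reproduction coders.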

II. PROBLEM FORMULATION AND CODING 
THEOREMS 

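Let $\mathbb{N}^n = \{0, 1, \ldots, n\}$, $n \in \mathbb{N} = \{0, 1, 2, \ldots\}$. The source and reconstruction alphabets are sequences of Polish spaces [13] $\{\mathcal{X}_t : t \in \mathbb{N}\}$ and $\{\mathcal{Y}_t : t \in \mathbb{N}\}$, respectively (i.e., $\mathcal{Y}_t$, $\mathcal{X}_t$ are complete separable metric spaces), associated with their corresponding measurable spaces $(\mathcal{X}_t, \mathcal{B}(\mathcal{X}_t))$ and $(\mathcal{Y}_t, \mathcal{B}(\mathcal{Y}_t))$. Sequences of alphabets are identified with the product spaces $(\mathcal{X}_{0,n}, \mathcal{B}(\mathcal{X}_{0,n})) = \times_{k=0}^{n} (\mathcal{X}_k, \mathcal{B}(\mathcal{X}_k))$ and $(\mathcal{Y}_{0,n}, \mathcal{B}(\mathcal{Y}_{0,n})) = \times_{k=0}^{n} (\mathcal{Y}_k, \mathcal{B}(\mathcal{Y}_k))$. The source and reconstruction are processes denoted by $X^n = \{X_i : i \in \mathbb{N}^n\}$, $X_i \in \mathcal{X}_i$, and by $Y^n = \{Y_i : i \in \mathbb{N}^n\}$, $Y_i \in \mathcal{Y}_i$, respectively. Probability measures on any measurable space $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ are denoted by $\mathcal{M}_1(\mathcal{Z})$. It is assumed that the $\sigma$-algebras $\sigma\{X^{-1}\} = \sigma\{Y^{-1}\} = \{\emptyset, \Omega\}$.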

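Definition 2.1: Let $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ be measurable spaces in which $\mathcal{Y}$ is a Polish space.

A stochastic kernel on $\mathcal{Y}$ given $\mathcal{X}$ is a mapping $q : \mathcal{B}(\mathcal{Y}) \times \mathcal{X} \to [0, 1]$ satisfying the following two properties:

1) For every $x \in \mathcal{X}$, the set function $q(\cdot; x)$ is a probability measure (possibly finitely additive) on $\mathcal{B}(\mathcal{Y})$.

2) For every $F \in \mathcal{B}(\mathcal{Y})$, the function $q(F; \cdot)$ is $\mathcal{B}(\mathcal{X})$-measurable.

The set of all such stochastic kernels is denoted by $Q(\mathcal{Y}; \mathcal{X})$.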



Stochastic kernels are classified into non-causal and causal 
as follows. 

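Definition 2.2: Given measurable spaces $(\mathcal{X}_{0,n}, \mathcal{B}(\mathcal{X}_{0,n}))$, $(\mathcal{Y}_{0,n}, \mathcal{B}(\mathcal{Y}_{0,n}))$, and their product spaces, data compression channels are defined as follows.

1) A Non-Causal Data Compression Channel is a stochastic kernel $q_{0,n}(dy^n; x^n) \in Q(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, $n \in \mathbb{N}$.

2) A Causal Product Data Compression Channel is a convolution of a sequence of causal stochastic kernels defined by

$$\overrightarrow{q}_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q_i(dy_i; y^{i-1}, x^i)$$

where $q_i \in Q(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i})$, $i = 0, \ldots, n$, $n \in \mathbb{N}$.

The set of such convolutions of causal kernels is denoted by $\overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$.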

A. Information Theoretic Causal Rate Distortion Function 

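This section gives the abstract formulation of $R^c(D)$. Given a source probability measure $\mu_{0,n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$ and a reconstruction kernel $\overrightarrow{q}_{0,n} \in \overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ consistent with a causal reproduction coder, define the following probability measures.

P1: The joint measure $P_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n} \times \mathcal{X}_{0,n})$:

$$P_{0,n}(G_{0,n}) = (\mu_{0,n} \otimes \overrightarrow{q}_{0,n})(G_{0,n}) = \int_{\mathcal{X}_{0,n}} \overrightarrow{q}_{0,n}(G_{0,n,x^n}; x^n)\, \mu_{0,n}(dx^n), \quad G_{0,n} \in \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n}),$$

where $G_{0,n,x^n}$ is the $x^n$-section of $G_{0,n}$ at point $x^n$ defined by $G_{0,n,x^n} = \{y^n \in \mathcal{Y}_{0,n} : (x^n, y^n) \in G_{0,n}\}$ and $\otimes$ denotes the convolution.

P2: The marginal measure $\nu_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$:

$$\nu_{0,n}(F_{0,n}) = P_{0,n}(\mathcal{X}_{0,n} \times F_{0,n}) = \int_{\mathcal{X}_{0,n}} \overrightarrow{q}_{0,n}(F_{0,n}; x^n)\, \mu_{0,n}(dx^n), \quad F_{0,n} \in \mathcal{B}(\mathcal{Y}_{0,n}).$$

P3: The product measure $\pi_{0,n} : \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n}) \to [0, 1]$ of $\mu_{0,n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$ and $\nu_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$:

$$\pi_{0,n}(G_{0,n}) = (\mu_{0,n} \times \nu_{0,n})(G_{0,n}) = \int_{\mathcal{X}_{0,n}} \nu_{0,n}(G_{0,n,x^n})\, \mu_{0,n}(dx^n), \quad G_{0,n} \in \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n}).$$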

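The precise information measure used to define the CRDF is the mutual information between two sequences of random processes $X^n$ and $Y^n$ whose distributions are consistent with the definition of the causal reproduction coder, i.e., generated via P1-P3. Hence, by the construction of probability measures P1-P3 and the chain rule of relative entropy [13]:

$$I(X^n; Y^n) = \mathbb{D}(P_{0,n} \| \pi_{0,n}) = \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\Big( \frac{d(\mu_{0,n} \otimes \overrightarrow{q}_{0,n})}{d(\mu_{0,n} \times \nu_{0,n})} \Big)\, d(\mu_{0,n} \otimes \overrightarrow{q}_{0,n}) \qquad (3)$$

$$= \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\Big( \frac{\overrightarrow{q}_{0,n}(dy^n; x^n)}{\nu_{0,n}(dy^n)} \Big)\, \overrightarrow{q}_{0,n}(dy^n; x^n) \otimes \mu_{0,n}(dx^n) \qquad (4)$$

$$\equiv I_{X^n \to Y^n}(\mu_{0,n}, \overrightarrow{q}_{0,n}) \qquad (5)$$

Note that (5) states that mutual information is expressed as a functional of $\{\mu_{0,n}, \overrightarrow{q}_{0,n}\}$, denoted by $I_{X^n \to Y^n}(\mu_{0,n}, \overrightarrow{q}_{0,n})$.

Also, if the causality assumption on the reproduction coder is not imposed, then $I(X^n; Y^n) = I(\mu_{0,n}, q_{0,n})$, which is how the classical RDF is defined.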

The next lemma gives equivalent statements which are consistent with causal reproduction coders in terms of causal convolution of reconstruction kernels, mutual information, directed information, and conditional independence.

Lemma 2.3: The following are equivalent for each $n \in \mathbb{N}$.

1) $q_{0,n}(dy^n; x^n) = \overrightarrow{q}_{0,n}(dy^n; x^n)$ a.s., where $\overrightarrow{q}_{0,n}$ is given in Definition 2.2-2).

2) For each $i = 0, 1, \ldots, n-1$, $Y_i \leftrightarrow (X^i, Y^{i-1}) \leftrightarrow (X_{i+1}, \ldots, X_n)$ forms a Markov chain.

3) $I(X^n; Y^n) = I(X^n \to Y^n)$.

4) For each $i = 0, 1, \ldots, n-1$, $Y^i \leftrightarrow X^i \leftrightarrow X_{i+1}$ forms a Markov chain.

Proof: The proof is omitted due to space limitation. ■

Under Lemma 2.3, $I(X^n \to Y^n) = I_{X^n \to Y^n}(\mu_{0,n}, \overrightarrow{q}_{0,n})$ is a functional of $\{\mu_{0,n}, \overrightarrow{q}_{0,n}\}$. Hence, the information definition of the causal rate distortion function is obtained by optimizing $I_{X^n \to Y^n}(\mu_{0,n}, \overrightarrow{q}_{0,n})$ over $\overrightarrow{q}_{0,n}$ which satisfies a distortion constraint.
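Definition 2.4: (Causal Information Rate Distortion Function) Suppose $d_{0,n}(x^n, y^n) = \sum_{i=0}^{n} \rho_i(x^i, y^i)$, where $\rho_i : \mathcal{X}_{0,i} \times \mathcal{Y}_{0,i} \to [0, \infty)$ is a sequence of $\mathcal{B}(\mathcal{X}_{0,i}) \times \mathcal{B}(\mathcal{Y}_{0,i})$-measurable distortion functions, and let $\overrightarrow{Q}_{0,n}(D)$ (assuming it is non-empty) denote the average distortion or fidelity constraint defined by

$$\overrightarrow{Q}_{0,n}(D) \triangleq \Big\{ \overrightarrow{q}_{0,n} \in \overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}) : \ell_{d_{0,n}}(\overrightarrow{q}_{0,n}) \triangleq \frac{1}{n+1} \int_{\mathcal{X}_{0,n}} \int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, \overrightarrow{q}_{0,n}(dy^n; x^n) \otimes \mu_{0,n}(dx^n) \leq D \Big\}, \quad D \geq 0. \qquad (6)$$

Define

$$R^c_{0,n}(D) \triangleq \inf_{\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{0,n}(D)} I_{X^n \to Y^n}(\mu_{0,n}, \overrightarrow{q}_{0,n}). \qquad (7)$$

The operational meaning of the CRDF is established via $R^c(D) \triangleq \lim_{n \to \infty} \frac{1}{n+1} R^c_{0,n}(D)$, provided the limit exists. Clearly, $R^c_{0,n}(D)$ is characterized by minimizing $I_{X^n \to Y^n}(\mu_{0,n}, \overrightarrow{q}_{0,n})$ over the causal convolution measures $\overrightarrow{Q}_{0,n}(D)$.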
B. Coding Theorems for Causal and Sequential Codes 

This section gives an operational meaning to $R^c_{0,n}(D)$ via coding theorems. There are two cases, sequential codes and causal codes.

Sequential Codes. Coding theorems for sequential codes are established in [11] for the finite alphabet case and a two-dimensional source $X^{T,N} = \{X_{t,n} : t = 0, \ldots, T, n = 0, \ldots, N\}$, where $t$ represents the time index and $n$ represents the spatial index, under the assumption that $P(X^{T,N}) = \otimes_{n=0}^{N} P(X^{T}_{n})$ and $\{X^{T}_{n} : n = 0, \ldots, N\}$ are identically distributed, and the distortion constraint is

$$\Big\{ E_{X^{T,N}}\Big\{ \frac{1}{N+1} \sum_{n=0}^{N} \rho(x_{t,n}, y_{t,n}) \Big\} \leq D_t, \quad t = 0, 1, \ldots, T \Big\}.$$

With a slight modification of the per-letter distortion function above, it can be shown that the coding theorem in [11] is still valid, and that the corresponding sequential RDF is given by $R^{SRD}(D) = R^c(D)$. The coding theorem is derived using strong typicality.

Causal Codes. Here we describe a coding theorem for causal codes.

Definition 2.5: (Causal Code) An $(n, 2^{nR}, D)$ causal source code of block length $n$ and rate $R$ consists of an encoding mapping $e(\cdot)$, $e : \mathcal{X}_{0,n} \to \mathcal{W} \triangleq \{1, 2, \ldots, 2^{nR}\}$, and a sequence of decoder mappings $\{g_i(\cdot)\}_{i=0}^{n}$, $g_i : \{1, 2, \ldots, 2^{nR}\} \to \mathcal{Y}_i$, $i = 0, 1, \ldots, n$, such that the sequence of reproduction coders $\{f_i = g_i \circ e\}_{i=0}^{n}$ is causal.

Definition 2.6: (Achievable Rate) A rate distortion pair $(R, D)$ is called achievable if for all $\epsilon > 0$ and sufficiently large $n$ there exists an $(n, 2^{nR}, D)$ causal code such that

$$\frac{1}{n+1} E\big\{ d_{0,n}(X^n, Y^n) \big\} \leq D + \epsilon.$$

Definition 2.7: (Causal Rate Distortion Function) The CRDF $R(D)$ is the infimum of rates $R$ such that $(R, D)$ is achievable.

The derivation of the coding theorem can be done for i) stationary ergodic processes $\{(X_i, Y_i) : i = 0, 1, \ldots\}$ by invoking versions of the Shannon-McMillan-Breiman Theorem, ii) information and distortion stable processes by invoking versions of Dobrushin's conditions, and iii) processes with information spectrum via variants of the methods in [4]. Here we discuss ii) since the distortion function $d_{0,n}(x^n, y^n)$ is general and does not fall under the special case discussed in [2, Section 9.8] for ergodic sources.

Define the information density consistent with the causal reproduction coder by

$$\Lambda_{0,n}(x^n, y^n) \triangleq \log \frac{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)},$$

where absolute continuity $\overrightarrow{P}_{Y^n|X^n}(\cdot|x^n) \ll P_{Y^n}(\cdot)$, $\mu_{0,n}$-a.s. for almost all $x^n \in \mathcal{X}_{0,n}$, is assumed. Then $I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) = E\{\Lambda_{0,n}(x^n, y^n)\}$, where the joint distribution is $P_{X^n,Y^n} = P_{X^n} \otimes \overrightarrow{P}_{Y^n|X^n}$.

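Definition 2.8: (Information and Distortion Stable) For each $\epsilon > 0$ define the $\epsilon$-typical set of the directed information density by

$$\mathcal{T}_\epsilon^{(n)} \triangleq \Big\{ (x^n, y^n) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} : \Big| \frac{1}{n+1} \log \frac{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)} - \frac{1}{n+1} I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) \Big| < \epsilon \Big\}$$

and the $\epsilon$-typical set of the distortion by

$$\mathcal{D}_\epsilon^{(n)} \triangleq \Big\{ (x^n, y^n) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} : \Big| \frac{1}{n+1} d_{0,n}(x^n, y^n) - \frac{1}{n+1} E\{d_{0,n}(x^n, y^n)\} \Big| < \epsilon \Big\}.$$

The process $\{(X_n, Y_n) : n \in \mathbb{N}\}$ is called directed information and distortion stable if $\lim_{n \to \infty} \mathrm{Prob}\{\mathcal{T}_\epsilon^{(n)}\} = 1$ and $\lim_{n \to \infty} \mathrm{Prob}\{\mathcal{D}_\epsilon^{(n)}\} = 1$, respectively, for every $\epsilon > 0$.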
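As a simple illustration of distortion stability, consider the special case of an i.i.d. binary source passed through a memoryless binary symmetric kernel with Hamming distortion, so that $d_{0,n}$ is a sum of $n+1$ independent Bernoulli variables. The sketch below computes the probability of the $\epsilon$-typical distortion set exactly; the memoryless setting, the parameter values, and the function name are assumptions chosen for this example.

```python
from math import comb

def distortion_typical_prob(n, flip, eps):
    """Exact probability of the eps-typical distortion set.

    With n + 1 independent uses of a binary symmetric kernel with crossover
    probability flip, the Hamming distortion d_{0,n} is Binomial(n + 1, flip)
    and its normalized mean is flip.
    """
    m = n + 1
    return sum(
        comb(m, k) * flip**k * (1 - flip) ** (m - k)
        for k in range(m + 1)
        if abs(k / m - flip) < eps
    )

for n in (9, 99, 999):
    print(n, distortion_typical_prob(n, 0.1, 0.05))
```

The printed probabilities increase toward one as $n$ grows, which is the behavior required of a distortion stable process. Sources with memory and general $\rho_i(x^i, y^i)$ need not behave this way, which is why stability is imposed as an assumption.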



Note that for a stationary ergodic process $\{(X_n, Y_n) : n \in \mathbb{N}\}$ and certain distortion functions (see [2, Section 9.8]) information and distortion stability follows. Before the statements leading to the coding theorem are introduced, the notion of stability of the source is required.

Definition 2.9: The source $\{X_n : n \in \mathbb{N}\}$ is called stable if for any given $D \geq 0$ and $\epsilon > 0$ there exists $\{Y_n : n \in \mathbb{N}\}$ such that $\{(X_n, Y_n) : n \in \mathbb{N}\}$ is directed information and distortion stable, and



lim 



lim — 

n— foo 71 



— £;{do,4a;",2/")} 

-A'^' pIuI\ }^^(^) 



(8) 



(9) 



where $E\{\cdot\}$ is with respect to $P_{X^n,Y^n} = \overrightarrow{P}_{Y^n|X^n} \otimes P_{X^n}$. Note that by specializing $d_{0,n}(x^n, y^n)$ to distortion functions that satisfy the sub-additivity property, the limit above exists. Utilizing Definition 2.9 it can be shown that the following statements hold, which are vital in establishing the coding theorem.

Lemma 2.10: Assume $\{X_n : n \in \mathbb{N}\}$ is stable and the joint distribution $P_{X^n,Y^n}$ is defined by $P_{X^n,Y^n}(dx^n, dy^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n)$. Then

1) $\lim_{n \to \infty} P_{X^n,Y^n}(\mathcal{T}_\epsilon^{(n)}) = \lim_{n \to \infty} P_{X^n,Y^n}(\mathcal{D}_\epsilon^{(n)}) = 1$.

2) For sufficiently large $n$, there exists $\epsilon > 0$ such that



< 2 



n(lx"^y..(Px":^^l'"|X")+3£) 



Px^idx") 

Using Lemma 2.10 the source coding theorem stated below can be established.

Theorem 2.11: (Source Coding Theorem) Assume $\{X_n : n \in \mathbb{N}\}$ is stable and $\sup_{(x^i, y^i) \in \mathcal{X}_{0,i} \times \mathcal{Y}_{0,i}} \rho_i(x^i, y^i) < k$, $k < \infty$, for all $i$. If $R > R^c(D)$ then for any $\delta > 0$ and sufficiently large $n$, there exists an $(n, 2^{nR}, D)$ causal code which satisfies the average distortion $E_{P_{X^n,Y^n}}\big\{ \frac{1}{n+1} d_{0,n}(x^n, y^n) \big\} \leq D + \delta$.

Proof: The derivation utilizes Lemma 2.10 and random codebook generation. Fix $\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)$, which achieves the equality in $R^c(D)$ (e.g., (8)). Calculate $P_{Y^n}(dy^n) = \int_{\mathcal{X}_{0,n}} \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) P_{X^n}(dx^n)$. Randomly generate a rate distortion codebook $\mathcal{C}$ of $2^{nR}$ sequences $Y^n$ according to $P_{Y^n}(dy^n)$ and reveal the codebook to the encoder and decoder. Utilizing Lemma 2.10 and Definition 2.9 the result is obtained following [3]. ■

III. EXISTENCE OF OPTIMAL CAUSAL 
RECONSTRUCTION 

In this section, the existence of the minimizing causal product kernel in (7) is shown by using the topology of weak convergence of probability measures on Polish spaces. The only assumptions required are 1) $\mathcal{Y}_{0,n}$ is a compact Polish space, 2) $\mathcal{X}_{0,n}$ is a Polish space, and 3) $d_{0,n}(x^n, \cdot)$ is continuous on $\mathcal{Y}_{0,n}$.

A. Weak Compactness and Existence of Optimal Reconstruc- 
tion Kernel 

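Define the family of measures

$$\overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}) = \Big\{ \overrightarrow{q}_{0,n}(dy^n; x^n) : \overrightarrow{q}_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q_i(dy_i; y^{i-1}, x^i) \Big\}.$$

Lemma 3.1: Let $\mathcal{Y}_{0,n}$ be a compact Polish space and $\mathcal{X}_{0,n}$ a Polish space. Then

1) The family of measures $\overrightarrow{q}_{0,n}(\cdot; x^n) \in \overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is compact.

2) Under the assumption that $d_{0,n}(x^n, \cdot)$ is continuous in $\mathcal{Y}_{0,n}$, the set $\overrightarrow{Q}_{0,n}(D)$ is a closed subset of $\overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$.

Proof: 1) This follows from the fact that any $\overrightarrow{q}_{0,n}(dy^n; x^n) \in \overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is factorized as $\overrightarrow{q}_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q_i(dy_i; y^{i-1}, x^i)$, where $q_i(dy_i; y^{i-1}, x^i) \in Q(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i})$, $0 \leq i \leq n$, and $\mathcal{Y}_{0,n}$ being a compact Polish space implies that $\{q_i(\cdot; y^{i-1}, x^i) : y^{i-1} \in \mathcal{Y}_{0,i-1}, x^i \in \mathcal{X}_{0,i}\}$ is compact. Utilizing this, by induction it can be shown that the family of convolution measures $\overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is compact.
2) Utilizing compactness of $\overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ and the assumption on $d_{0,n}(x^n, \cdot)$ it can be shown that $\overrightarrow{Q}_{0,n}(D)$ is a closed subset of $\overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$. ■

The next theorem establishes existence of the minimizing reconstruction kernel for (7).

Theorem 3.2: Suppose $\mathcal{Y}_{0,n}$ is a compact Polish space and $d_{0,n}(x^n, \cdot)$ is continuous in $\mathcal{Y}_{0,n}$. Then $R^c_{0,n}(D)$ has a minimum.

Proof: The assumptions are sufficient to show lower semicontinuity of the functional $I_{X^n \to Y^n}(\mu_{0,n}, \overrightarrow{q}_{0,n})$ with respect to $\overrightarrow{q}_{0,n}$ for a fixed $\mu_{0,n}$. Moreover, by Lemma 3.1-2), since $\overrightarrow{Q}_{0,n}(D)$ is a closed subset of the compact set $\overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, $\overrightarrow{Q}_{0,n}(D)$ is also compact. By the Weierstrass theorem, existence follows. ■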

IV. OPTIMAL CAUSAL RECONSTRUCTION 

In this section the form of the optimal causal reconstruction kernel is derived and the properties of $R^c_{0,n}(D)$ are discussed under a stationarity assumption.

A. Optimal Reconstruction

Assumption 4.1: The family of measures that admits the factorization $\overrightarrow{q}_{0,n}(dy^n|x^n) = \otimes_{i=0}^{n} q_i(dy_i|y^{i-1}, x^i)$ is the convolution of stationary conditional distributions.

Assumption 4.1 holds for a stationary ergodic process $\{(X_i, Y_i) : i \in \mathbb{N}\}$ and $\rho_i(x^i, y^i)$ which is stationary and time-invariant for all $i$. The method is based on calculus of variations on the space of measures [14]. Utilizing Assumption 4.1, which holds for stationary ergodic processes $\{(X_i, Y_i) : i = 0, 1, \ldots, n\}$ and a single letter distortion function or the distortion function discussed in [2, Section 9.8], the Gateaux differential of $I_{X^n \to Y^n}(\mu_{0,n}, \overrightarrow{q}_{0,n})$ is taken in only one direction (since the $q_i(dy_i; y^{i-1}, x^i)$ are stationary). This simplifies the calculation of the Gateaux derivative of $I_{X^n \to Y^n}(\mu_{0,n}, \overrightarrow{q}_{0,n})$.

Theorem 4.2: Suppose $I_{\mu_{0,n}}(\overrightarrow{q}_{0,n}) \triangleq I_{X^n \to Y^n}(\mu_{0,n}, \overrightarrow{q}_{0,n})$ is well defined for every $\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{0,n}(D)$, possibly taking values from the set $[0, \infty)$. Then $\overrightarrow{q}_{0,n} \to I_{\mu_{0,n}}(\overrightarrow{q}_{0,n})$ is Gateaux differentiable at every point in $\overrightarrow{Q}_{0,n}(D)$, and the Gateaux derivative at the point $\overrightarrow{q}^0_{0,n}$ in the direction $\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n}$ is given by



O.i 



(10) 



Proof: The fully unconstrained problem of (11) is obtained by introducing another set of Lagrange multipliers. Using this and Theorem 4.2 we obtain (12) and (13). ■
Note that according to Assumption 4.1 the terms appearing on the right side of (12) are identical.

B. Properties of $R^c_{0,n}(D)$

In this section, we present some important properties of the CRDF as it is defined in (7).
Theorem 4.4:

1) $R^c_{0,n}(D)$ is a convex, non-increasing function of $D$.

2) If Pi G L'^iiTi) then 

b) Rq^{D) is non-increasing for D G [0,Dmax] where 
Dmax = TITlT,7=o^^^(P^) Rx),ni^) = ^ ^^v any 

D ^ ^max 

3) $R^c_{0,n}(D) > 0$ for all $D < D_{max}$ and $R^c_{0,n}(D) = 0$ for all $D \geq D_{max}$, where



where $\nu_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$ is the marginal measure corresponding to $\overrightarrow{q}_{0,n} \otimes \mu_{0,n}$.

Proof: The proof utilizes Assumption 4.1. ■
The constrained problem defined by (7) can be reformulated using Lagrange multipliers as follows (equivalence of constrained and unconstrained problems follows from [14]).



Dr. 



mm 



1 " /■ 



-s{e{to,n)-D)} (11) 

and $s \in (-\infty, 0]$ is the Lagrange multiplier.
Note that $\overrightarrow{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ represents the causality constraint set. Therefore, one should introduce another set of Lagrange multipliers to obtain an optimization without constraints. This process is involved; hence we state only the main results.

Theorem 4.3: Suppose Assumption 4.1 and $d_{0,n}(x^n, y^n) = \sum_{i=0}^{n} \rho_i(x^i, y^i)$ hold. The infimum in (11) is attained at $\overrightarrow{q}^*_{0,n} \in \overrightarrow{Q}_{0,n}(D)$ given by



and $\nu^*(dy_i; y^{i-1}) \in Q(\mathcal{Y}_i; \mathcal{Y}_{0,i-1})$. The causal rate distortion function is given by



Ids 



spiix\,y') */ 



yi 



n 

^ojD) ^ sD - 

<{dy^]y"^^))~^l,^-l{df^^]x'-^)®^lo,^{dx') 

(13) 

If $R^c_{0,n}(D) > 0$ then $s < 0$ and

^ Pi{x\y')'tlMv''^x')l^oAdx') = D 



'^ + ^t^o-^^o..Jyo, 



if such a minimum exists. 

Proof: Omitted due to space limitation. ■ 

V. CONCLUSION 

The solution of the CRDF subject to a reconstruction kernel 
which is a convolution of causal kernels is presented, on ab- 
stract alphabets. Some of its properties are also presented. It is 
believed that the optimal reconstruction kernel as a convolution 
of causal kernels has several implications in applications where 
causality of the decoder as a function of the source is of 
concern. Specific example by invoking (flTI) will be part of 
the final paper. 